Learning for Text Categorization and Information Extraction with ILP

Authors

  • Markus Junker
  • Michael Sintek
  • Matthias Rink
Abstract

Text Categorization (TC) and Information Extraction (IE) are two important goals of Natural Language Processing. While handcrafting rules for both tasks has a long tradition, learning approaches have also attracted much interest. In the present paper we try to provide a solid basis for the application of ILP methods to these learning problems. We propose to introduce three basic types (namely a type for text, one for words and one for text positions) and three simple predicate definitions over these types, which make it possible to write text categorization and information extraction rules as logic programs. Based on the proposed representation, we present the key concepts of our approach to the problem of learning rules for TC and IE in terms of ILP. We conclude the paper by comparing our approach of representing texts and rules as logic programs to others.
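As a rough illustration of the representation described in the abstract, the sketch below models the three basic types (text, word, text position) in Python and defines three simple predicates over them. The abstract does not name the predicates, so `word_at`, `successor`, and `positions`, as well as the two example rules, are hypothetical stand-ins and not the authors' definitions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The three basic types named in the abstract. Their concrete encoding
# here (string, int, tuple of words) is an assumption for illustration.
Word = str
Position = int


@dataclass(frozen=True)
class Text:
    words: Tuple[Word, ...]


# Three hypothetical predicates over these types.
def word_at(text: Text, pos: Position, word: Word) -> bool:
    """True if `word` occurs at position `pos` of `text`."""
    return 0 <= pos < len(text.words) and text.words[pos] == word


def successor(pos: Position, nxt: Position) -> bool:
    """True if `nxt` is the position directly after `pos`."""
    return nxt == pos + 1


def positions(text: Text) -> range:
    """All valid positions of `text`."""
    return range(len(text.words))


# A categorization rule written like a logic-program clause:
# the text belongs to the category if "logic" is directly
# followed by "programming" somewhere in it.
def mentions_logic_programming(text: Text) -> bool:
    return any(
        word_at(text, p, "logic") and word_at(text, q, "programming")
        for p in positions(text)
        for q in positions(text)
        if successor(p, q)
    )


# An extraction rule: return every position directly after `trigger`.
def extract_after(text: Text, trigger: Word) -> List[Position]:
    return [
        q
        for p in positions(text)
        for q in positions(text)
        if word_at(text, p, trigger) and successor(p, q)
    ]
```

In this sketch a categorization rule returns a truth value for a whole text, while an extraction rule returns text positions, which mirrors the distinction between TC and IE that the abstract draws.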

Similar Resources

Learning Semantic-Level Information Extraction Rules by Type-Oriented ILP

This paper describes an approach to using semantic representations for learning information extraction (IE) rules by a type-oriented inductive logic programming (ILP) system. NLP components of a machine translation system are used to automatically generate semantic representations of a text corpus that can be given directly to an ILP system. The latest experimental results show high precision and...

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods for searching, filtering, and managing resources are in high demand. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...

Applying Type-Oriented ILP to IE Rule Generation

This paper describes our approach to applying type-oriented inductive logic programming (ILP) to information extraction (IE) tasks and the latest experimental results in learning IE rules from the data generated from 100 newspaper articles. Information extraction involves extracting key information from a text corpus in order to fill empty slots of given templates. A bottleneck in building IE syste...

Learning to Classify English Text with ILP Methods

Text categorization is the task of classifying text into one of several pre-defined categories. In this paper we will evaluate the effectiveness of several ILP methods for text categorization, and also compare them to their propositional analogs. The methods considered are FOIL, the propositional rule-learning system RIPPER, and a first-order version of RIPPER called FLIPPER. We show that the benefi...

A Systematic Study of Text Mining Techniques

Text mining is a new and exciting research area that tries to solve the information overload problem by using techniques from machine learning, natural language processing (NLP), data mining, information retrieval (IR), and knowledge management. Text mining involves the pre-processing of document collections such as information extraction, term extraction, text categorization, and storage of in...

Publication date: 1999